Assembly Source File | 1988-03-28 | 15KB | 281 lines
TITLE SOUNDEXA
PAGE ,132
;Author : bp-programs, Kalamazoo, Michigan
;Date: Jan 88, for Clipper Summer 87
;Source Code protected by United States Copyright Law
;Permission given for code to be incorporated in other programs by author
;Syntax: SOUNDEXA(string[,filler])
;The soundex code is useful to look up names where you aren't sure of the
;spelling. Codes for similar sounding names are generally (but NOT always)
;close together. The code has the format LETTER-DIGIT-DIGIT-DIGIT. LETTER is
;simply the upper case first letter of the name. DIGITs are derived from the
;translation table below. Empty positions are NOT translated. If there are
;two or more letters with the same code following each other in the name, only
;ONE code number is used. 'Schmidt' is 'S530', not 'S253' or 'S533'. If there
;are more than three code numbers, the extra ones aren't used. If there are
;fewer, the code is padded with zeros. (But see about FILLER below).
;Soundex            ABCDEFGHIJKLMNOPQRSTUVWXYZ
;Translation Table:  123 12  22455 12623 1 2 2
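;Worked examples (added for illustration; the codes are derived by hand from
;the rules and table above, assuming the '0' filler, and are not taken from a
;run of this routine):
;  Schmidt -> S, c dropped (same code as S), m=5, d=3,
;             t dropped (same code as d)      -> S530
;  Smith   -> S, m=5, t=3                     -> S530 (same as Schmidt)
;  Knuth   -> K, n=5, t=3                     -> K530
;  Gauss   -> G, s=2, second s dropped        -> G200
;  Lee     -> L, no coded letters follow      -> L000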
;SOUNDEXA is an assembly language implementation of the soundex code. It
;follows my interpretation of the algorithm found on pages 392/393 of Knuth's
;book 'Sorting and Searching', volume 3 of "The Art of Computer Programming".
;It does NOT return the same code as the soundex routine in examplec.c
;(SOUNDEXC) distributed with Clipper Summer 87 or the Rettig soundex routine
;distributed with Clipper Autumn 86 in extenddb.prg (SOUNDEXD).
;The main differences among the three implementations are listed below.
;          SOUNDEXA                 SOUNDEXC                SOUNDEXD
;          -----------------------  ----------------------  ------------------
;Format    A999                     A999                    A9999
;Dupes     Skips ltrs generating    Skips identical ltrs    Skips duplicate
;          the same code which are  adjacent in original    code numbers even
;          immediately adjacent in  text                    if not adjacent in
;          original text                                    original text
;Null      1. Null string           1. Null string          1. Null string
;Returns   2. Completely non-alpha  2. Non-alpha/non-space
;             string                   characters except
;                                      first char
;Fault     1. Ltrims leading        1. Does not trim, uses  1. Does not trim,
;Tolerance    non-alpha characters     non-alpha as lead       uses any char
;          2. Skips intermediate    2. Aborts with          2. Skips
;             non-alpha characters     non-alpha/non-space     intermediate
;                                      except first char       non-alpha chars
;Speed     3 secs/5000 repeats      9 secs/5000 repeats     90 secs/5000 repeats
;I believe, of course, that SOUNDEXA is the 'best' implementation because
;it's closest to Knuth's algorithm, most fault tolerant, fastest (and also
;smallest, by the way) and the most FLEXIBLE. More about this below.
;Knuth's algorithm uses 0s (character zero) to fill trailing empty slots.
;This makes sense when you're constructing an index, such as
; INDEX ON SOUNDEXA(LASTNAME) TO NAMX
;However, when you're SEEK/LOCATEing with SOUNDEX you generally want to find
;all likely candidates and want to make sure that you don't miss any. You'd
;rather find a few wrong ones than miss a single right one. In that case
;you want to include even partial matches, such as
; LOCATE ALL FOR TRIM(SOUNDEXA(PART_NAME))
;SOUNDEXA allows you to select between two fillers, spaces or '0'. Even
;though zeros are 'standard', I find spaces more flexible and have made them
;the default. By specifying a second argument SOUNDEXA(LASTNAME,FILLER) once,
;you change the state of the routine. If FILLER is a '0' (as a character, not
;a number), all future calls to SOUNDEXA will use zeros for filling. If
;FILLER is any other character (or even a null string), SOUNDEXA will use
;spaces in the future. If there isn't a second argument, SOUNDEXA will use
;what you specified before or the default. If you prefer zeros as the default,
;change the FILLER DB to '0' in the DATASG.
;===================================================
EXTRN __PARINFO:FAR ;Clipper EXTEND routine, tells how many arguments
EXTRN __PARC:FAR ;Clipper EXTEND routine, gets a character argument
EXTRN __RETC:FAR ;Clipper EXTEND routine, returns a character value
SX_LENGTH EQU 4 ;Length of soundex code
DGROUP GROUP DATASG ;Ties this segment to the other data segments
;of Clipper. DS points to this DGROUP when
;we arrive in the assembly routine
DATASG SEGMENT WORD PUBLIC 'DATA' ;All PUBLIC segments with the name DATASG
;will be combined by the linker. All segments
;with the class 'DATA' will be adjacent to
;each other. WORD means that the segment
;starts on an even address, which can sometimes
;be minutely faster on an 8086/80286 machine.
SOUNDEX DB SX_LENGTH DUP (?) ; Space for SOUNDEX result
DB 00 ; Terminator byte
;Strings in C and Clipper are terminated by a NULL (or NUL or
;NIL, it all means the same thing). There is no length byte
;or word as in BASIC or Turbo Pascal.
FILLER DB ' ' ; Filler byte for padding of SOUNDEX, can be
; space (default) or '0'
; Translate table from UC letters to SOUNDEX codes
; Omitted letters return NULL
; 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
; ' 123 12  22455 12623 1 2 2'
TRANSLATE db 0,'123',0,'12',0,0,'22455',0,'12623',0,'1',0,'2',0,'2'
DATASG ENDS
;==================================================
;==================================================
_PROG SEGMENT BYTE PUBLIC 'CODE' ;All PUBLIC segments with
;the name _PROG will be com-
;bined, all segments with the
;class 'CODE' will be
;adjacent. BYTE means that
;the segments will be aligned
;(stuck together) without any
;padding.
ASSUME CS:_PROG, DS:DGROUP, ES:NOTHING ;This is the way the segment
;registers are set up when we
;arrive here from Clipper.
PUBLIC SOUNDEXA ;Used in linking to Clipper, lets Clipper know
;where this routine is.
SOUNDEXA PROC FAR ;The name of our routine (procedure)
PUSH BP ;The Clipper extend documentation on disk
PUSH DI ;says that we have to save registers
PUSH SI ;BP, DI, SI, ES and DS. We are not
PUSH ES ;disturbing BP, so we may not have to save it.
PUSH DS ;But the Clipper routines __PARINFO, __PARC
;and __RETC may do so, we don't know.
;Ensure null string in case of missing argument or no letters
;We do this by moving a NULL byte in the first place of the
;SOUNDEX code. It will be overwritten if there's no error.
MOV BYTE PTR DGROUP:[SOUNDEX], 0
SUB AX, AX ;Faster and smaller than MOV AX, 0
PUSH AX
CALL __PARINFO ;Find out how many arguments passed
ADD SP, 2 ;Clean up stack. C routines, unlike BASIC
;or Pascal ones, do NOT clean up the stack themselves.
CMP AX, 1 ;Is there 1 argument?
JE M